DOCUMENT RESUME

ED 409 328                                        TM 026 644

AUTHOR        Roberts, James S.; And Others
TITLE         Comparative Validity of the Likert and Thurstone Approaches
              to Attitude Measurement.
PUB DATE      Mar 97
NOTE          27p.; Paper presented at the Annual Meeting of the American
              Educational Research Association (Chicago, IL, March 24-28,
              1997).
PUB TYPE      Reports - Research (143) -- Speeches/Meeting Papers (150)
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   *Attitude Measures; Attitudes; Comparative Analysis; *Error
              of Measurement; Item Response Theory; *Likert Scales;
              *Measurement Techniques; Research Methodology; *Validity
IDENTIFIERS   *Thurstone Scales



ABSTRACT 



Graded or binary disagree-agree responses to attitude 
statements are often collected for the purpose of attitude measurement. The 
empirical characteristics of these responses will generally be inconsistent 
with the analytical logic that forms the basis of the Likert attitude 
measurement technique (R. Likert, 1932). As a consequence, the Likert
procedure can lead to invalid measurement of a select group of individuals. 
Likert attitude estimates can substantially misrepresent individuals with the 
most negative and most positive attitudes so that they appear to have more 
moderate opinions. In contrast, the Thurstone attitude measurement procedure 
(L. L. Thurstone, 1928) is generally more consistent with empirical 
characteristics of disagree-agree responses, and because of this superior 
consistency, Thurstone attitude scores do not suffer from this type of 
degraded validity. This paper highlights theoretical differences between the 
Likert and Thurstone approaches to attitude measurement and demonstrates how 
such differences can lead to discrepant attitude estimates for individuals 
with the most extreme opinions. Both simulated data and real data on attitude
toward abortion are used to demonstrate this discrepancy. The results suggest 
that attitude researchers should, at the very least, devote more attention to 
the empirical response characteristics of items on a Likert attitude 
questionnaire. At most, these results suggest that other methods, like the 
Thurstone technique or one of its recently developed item response theory 
counterparts, should be used to derive attitude estimates from disagree-agree 
responses. (Contains 1 table, 12 figures, and 36 references.) (Author/SLD) 






Comparative validity of the Likert and Thurstone approaches to attitude measurement¹



¹This research was conducted while the first author was a postdoctoral fellow in the
Division of Statistics and Psychometric Research at Educational Testing Service under the
mentorship of Dr. John Donoghue. We would like to thank Dr. Donoghue for his helpful
comments about this work. Correspondence should be sent to James S. Roberts, Center for
Drug and Alcohol Programs, Medical University of South Carolina, 171 Ashley Avenue, IOP - 4
North, Charleston, SC 29425-0742, e-mail: james_roberts@smtpgw.musc.edu.



James S. Roberts 
Alcohol Research Center 
Center for Drug and Alcohol Programs 
Medical University of South Carolina 



James E. Laughlin & Douglas H. Wedell 
Department of Psychology 
University of South Carolina 






Abstract 



Graded or binary disagree-agree responses to attitude statements are often collected for the 
purpose of attitude measurement. The empirical characteristics of these responses will generally 
be inconsistent with the analytical logic that forms the basis of the Likert attitude measurement 
technique. As a consequence, the Likert procedure can lead to invalid measurement of a select
group of individuals: those with the most extreme attitudes. Specifically, Likert
attitude estimates can substantially misrepresent individuals with the most negative and most 
positive attitudes so that they appear to have more moderate opinions. In contrast, the Thurstone 
attitude measurement procedure is generally more consistent with empirical characteristics of 
disagree-agree responses, and because of this superior consistency, Thurstone attitude scores do 
not suffer from this type of degraded validity. This paper highlights the theoretical differences 
between the Likert and Thurstone approaches to attitude measurement, and demonstrates how 
such differences can lead to discrepant attitude estimates for individuals with the most extreme 
opinions. Both simulated data and real data on attitude toward abortion are used to demonstrate 
this discrepancy. The results suggest that attitude researchers should, at the very least, devote 
more attention to the empirical response characteristics of items on a Likert attitude 
questionnaire. At most, these results suggest that other methods, like the Thurstone technique or 
one of its recently developed item response theory counterparts, should be used to derive attitude 
estimates from disagree-agree responses. 






Introductory texts often portray the Thurstone (1928) and the Likert (1932) approaches to 
attitude measurement as though both methods provide equally valid measures of attitude when 
individuals respond to a set of questionnaire items using a (binary or graded) disagree-agree 
response scale (Petty & Cacioppo, 1981; Mueller, 1986). This overly simplistic portrayal is
fostered by studies which indicate that Likert and Thurstone attitude scores are typically
correlated to at least a moderate degree (.60 < r < .95), regardless of whether responses to the
same set of items are scored with the two procedures (Ferguson, 1941; Likert, 1932; Likert,
Roslow & Murphy, 1934) or responses to independently constructed Likert and Thurstone 
questionnaires are compared (Edwards & Kenney, 1946; Flamer, 1983; Jaccard, Weber & 
Lundmark, 1975; Likert, 1932; Rhoads & Landy, 1973). Given these results, researchers have 
usually differentiated the two methods using other measurement criteria such as reliability and 
efficiency of scale construction. The general finding has been that Likert attitude scores exhibit 
either higher composite reliability (i.e., corrected split-half or corrected parallel forms reliability) 
or higher test-retest reliability as compared to Thurstone attitude scores (Seiler & Hough, 1970). 
Additionally, the general perception is that the Likert technique is easier and more efficient to 
carry out than the Thurstone technique, primarily because the former method does not require a 
judgment group to produce item scale values (Barclay & Weaver, 1962; Edwards & Kenney, 1946;
Mueller, 1986). These two features may account for the greater popularity of the
Likert procedure for attitude measurement (Petty & Cacioppo, 1981).

Although previous studies have suggested that Likert and Thurstone attitude scores will be 
linearly related to at least a moderate extent, they do not convincingly demonstrate that the two 
scores both measure the latent attitude with the same degree of validity. The relationship between 
Likert scores and true attitudes could still differ systematically from the corresponding 
relationship found for Thurstone scores whenever the correlation between the two types of scores 
is only moderately high. Therefore, distinctions between the two procedures might still be made 
with regard to their respective validities. 

We argue against the idea that the Thurstone and Likert methods generally yield comparably 
valid estimates of true attitudes, and for the idea that the methods should not be treated as equally 
applicable in traditional attitude measurement situations. Instead, the appropriate application of 
either method depends on the item response process that subjects use when endorsing attitude 
items. We also argue that in those traditional situations where subjects respond to attitude items 
using a graded or binary disagree-agree response scale, the empirical response process generally 
favors the use of the Thurstone procedure as opposed to the Likert procedure. Moreover, we use 
both simulated and real data to illustrate how the application of the Likert procedure in these 
situations can yield invalid measures for individuals with the most extreme attitudes. In contrast, 
the validity of the Thurstone procedure does not degenerate in these situations. 

Review of the Thurstone and Likert Approaches 

The Thurstone Approach 

The classic Thurstone approach to attitude scale construction involves two main stages. In
the first stage, a large number of attitude statements are written to span the entire range of
possible opinions, and these items are scaled with regard to their unfavorability or favorability 
towards a given attitude object. There are several Thurstonian techniques for scaling attitude 
items including pairwise comparisons (Thurstone, 1927abc), equal appearing intervals (Thurstone 
& Chave, 1929), and successive intervals (Safir, 1937). All of these methods require a group of 
subjects to make favorability judgments about each item (or each pair of items), and all three 
methods yield a set of item scale values which indicate how favorably or unfavorably the item’s 
sentiment reflects the attitude object. Those items with scale values having large standard errors 
are discarded from the pool of items under consideration. In the second stage, subjects are asked 
to indicate which attitude statements they agree with. Attitude estimates are developed for each 
individual by computing the mean (or median) scale value associated with the endorsed items, and then
these attitude estimates are used to develop empirical operating characteristic curves for each 
item. The final Thurstone scale is limited to “relevant” items with scale values that are more or
less uniformly distributed across the attitude continuum. A relevant item is one that attracts 
endorsements primarily from subjects whose attitudes are comparable to the sentiment expressed 
by the item. 
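
The second-stage scoring rule described above can be sketched in Python. This is a minimal illustration rather than the authors' procedure: the function name thurstone_score, the five scale values, and the two response patterns are hypothetical, and returning None for a respondent who endorses nothing is an assumption, since the text does not say how that case is handled.

```python
from statistics import mean, median


def thurstone_score(endorsed, scale_values, use_median=False):
    """Return the mean (or median) scale value of the items a respondent endorsed.

    endorsed     -- list of booleans, one per item (True = agrees with the item)
    scale_values -- list of item scale values obtained in the judgment stage
    Returns None when no item is endorsed (an assumed convention).
    """
    if len(endorsed) != len(scale_values):
        raise ValueError("endorsed and scale_values must have the same length")
    chosen = [value for agreed, value in zip(endorsed, scale_values) if agreed]
    if not chosen:
        return None
    return median(chosen) if use_median else mean(chosen)


# Hypothetical scale values on an 11-point favorability continuum.
SCALE_VALUES = [1.5, 3.0, 5.5, 8.0, 10.5]

# A respondent who agrees only with the two most favorable statements.
favorable = thurstone_score([False, False, False, True, True], SCALE_VALUES)

# A respondent who agrees with the three middle statements.
moderate = thurstone_score([False, True, True, True, False], SCALE_VALUES)
```

Because the estimate is an average of the scale values of the endorsed items, it stays on the same continuum as the item scale values, so a respondent who endorses only the most favorable statements receives a score near the favorable end.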




[Figure 1 not reproduced; axis label: True Attitude.]

Figure 1. Theoretical item characteristic curves associated with an unfolding model. From (upper) left to right, the
curves correspond to a moderately negative item, a neutral item and a moderately positive item.
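
The single-peaked curves in Figure 1 imply that agreement falls off on both sides of an item's location. The sketch below explores what that could mean for the two scoring rules. It assumes a Gaussian-shaped response function, a width of 1, four hypothetical item locations, a binary reverse-keyed Likert score, and a probability-weighted mean of item locations as a rough stand-in for the expected Thurstone score; none of these specifics are taken from the paper. The neutral item of Figure 1 is left out because a neutral statement has no keyed direction under this Likert scoring.

```python
import math


def agree_prob(theta, delta, width=1.0):
    """Assumed single-peaked response function: agreement is most likely when
    the respondent's attitude (theta) equals the item location (delta)."""
    return math.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))


# Hypothetical item locations: two negative and two positive statements.
ITEMS = [-2.0, -1.0, 1.0, 2.0]


def expected_likert(theta):
    """Expected binary Likert score: agreeing with a positive item or
    disagreeing with a negative item (reverse scoring) earns one point."""
    total = 0.0
    for delta in ITEMS:
        p = agree_prob(theta, delta)
        total += p if delta > 0 else 1.0 - p
    return total


def approx_thurstone(theta):
    """Probability-weighted mean of item locations, a rough approximation to
    the expected mean location of the items a respondent endorses."""
    weights = [agree_prob(theta, delta) for delta in ITEMS]
    return sum(w * d for w, d in zip(weights, ITEMS)) / sum(weights)


# Compare a neutral, a moderately positive, and an extremely positive attitude.
for theta in (0.0, 1.5, 5.0):
    print(f"theta={theta:4.1f}  Likert={expected_likert(theta):.2f}  "
          f"Thurstone~{approx_thurstone(theta):.2f}")
```

Under these assumptions the expected Likert score rises from the neutral point to the moderately positive attitude and then falls back to nearly the neutral value at the extreme, because the extreme respondent no longer agrees with the moderately positive statements. The weighted-mean score keeps increasing across the same three attitudes.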






